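# Load packages and data (the Fatalities panel data set ships with AER)
library(AER)
library(plotly)
library(tidyverse)
data("Fatalities")

# mean (years): average population and fatalities per state across all years
mapping <- Fatalities %>% select(state,year,pop,fatal) %>% mutate(state=toupper(state))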
map <- mapping %>% group_by(state) %>% summarise(pop=mean(pop),fatal=mean(fatal)) 
## `summarise()` ungrouping output (override with `.groups` argument)
map$hover <- with(map, paste(state, '<br>', "pop", pop, "fatal", fatal))

# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

fig <- plot_geo(map, locationmode = 'USA-states')
fig <- fig %>% add_trace(
    z = ~fatal, text=~hover,locations = ~state,
    color = ~fatal, colors = 'Purples'
  )
fig <- fig %>% colorbar(title = "Mean annual vehicle fatalities")
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
# a geo (choropleth) layout has no cartesian y axis, so no yaxis option is set
fig <- fig %>% layout(
    title = 'US Traffic Fatalities',
    geo = g
  )
fig
# by year
gg <- ggplot(mapping, aes(pop,fatal, color = state)) +
  geom_point(aes(size = fatal, frame = year, ids = state)) +
  scale_x_log10()
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(gg)

Note:
- The report might be shown to the class as an example if it is done very well or very badly, with your identity redacted.
- Each student needs to write and submit the report on their own for Project 1.
- Remove these remarks in your submission.


1 Introduction

In this section, state the questions of interest, the motivation for this analysis, and the potential impact of your results. You can simply rephrase the Project Description with minimal effort. You can also cite published papers or credible articles on the Internet. For instance, you may find this brief very relevant. More can be found by searching the keywords “class size”, “education”, and “performance.” See, among others, here for proper citation formats.

2 Background

In this section, explain the source of data, target population, sampling mechanism, and variables in this data set. You can briefly review existing research or known results, which will help you in the analysis. You can find the data set from many sources, e.g., the AER package or the Harvard dataverse. Both links provide information about this data set. The brief mentioned in the previous section is also a good reference to read. You can find more by searching the keyword “Project STAR” in, e.g., Google Scholar.

3 Descriptive analysis

Select the variables you find relevant based on your understanding in the Background section. Summarize univariate descriptive statistics for the selected variables (mean, standard deviations, missing values, quantiles, etc.). You can create the table using functions in base R, or use packages (see, e.g., this note).
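As a minimal sketch of such a univariate summary in base R, the snippet below computes the missing-value count, mean, standard deviation, and quartiles for a single numeric variable. The vector `scores` is made-up illustration data, not values from the STAR data set.

```r
# Hypothetical scores for illustration only (one missing value)
scores <- c(520, 540, NA, 500, 510, 530, 560, 550)

# Collect the usual univariate statistics in one named vector
univariate <- c(
  n_missing = sum(is.na(scores)),
  mean      = mean(scores, na.rm = TRUE),
  sd        = sd(scores, na.rm = TRUE),
  quantile(scores, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
)
print(round(univariate, 2))
```

Reporting the number of missing values next to the other statistics makes it clear how many observations each summary is based on.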

From the data set, we can easily see that the number of students assigned to each teacher varies. In order to obtain one summary measure with the teacher as the unit, we need to aggregate students’ performance (their math scores in 1st grade).
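One possible way to carry out this aggregation with dplyr is sketched below. The data frame `students` and its column names (`teacher_id`, `class_type`, `school_id`, `math1`) are hypothetical placeholders; the actual names depend on which version of the STAR data you use. The mean is only one candidate summary measure, and the median is shown alongside it for comparison.

```r
library(dplyr)

# Hypothetical student-level data; column names are assumptions for illustration
students <- data.frame(
  teacher_id = c("T1", "T1", "T1", "T2", "T2", "T3", "T3", "T3"),
  class_type = c("small", "small", "small", "regular", "regular",
                 "regular+aide", "regular+aide", "regular+aide"),
  school_id  = c("S1", "S1", "S1", "S1", "S1", "S2", "S2", "S2"),
  math1      = c(520, 540, 560, 500, 510, 530, NA, 550)
)

# One row per teacher: class size plus mean and median 1st-grade math score
teacher_summary <- students %>%
  group_by(teacher_id, class_type, school_id) %>%
  summarise(
    n_students   = n(),
    mean_math1   = mean(math1, na.rm = TRUE),
    median_math1 = median(math1, na.rm = TRUE),
    .groups = "drop"
  )
print(teacher_summary)
```

Whichever summary measure you choose, state the choice explicitly and justify it, since it defines the outcome used in the rest of the analysis.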

Report multivariate descriptive statistics for the outcome (the chosen summary measure for each teacher) against key variables (e.g., class types, school IDs).

4 Inferential analysis

We can define a two-way ANOVA model as follows \(Y_{ijk} = \mu_{..} + \alpha_{i} + \beta_{j} + \epsilon_{ijk}\), where the index \(i\) represents the class type: small (\(i=1\)), regular (\(i=2\)), regular with aide (\(i=3\)), and the index \(j\) represents the school indicator. You need to explain the rest of the parameters, state constraints on the parameters, and justify the choice of model (e.g., why no interaction terms).
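As an illustration only, one common way to complete the specification is the factor-effects form with sum-to-zero constraints and normal errors. This is an assumed convention, not the only valid one; here \(\sigma^2\) denotes the common error variance, a symbol introduced for this sketch.

```latex
\[
\sum_{i=1}^{3} \alpha_i = 0, \qquad
\sum_{j} \beta_j = 0, \qquad
\epsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2).
\]
```

Under these constraints, \(\mu_{..}\) can be read as an overall mean, \(\alpha_i\) as the deviation of class type \(i\) from it, and \(\beta_j\) as the deviation of school \(j\).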

The proposed model is a two-way ANOVA model. You can find the assumptions easily from the course notes or read the wiki page on ANOVA. State these assumptions and try to explain them in the context of Project STAR. You can find assumptions for the regression model in a similar manner.

You can fit the ANOVA model using aov() in R (or lm() for the regression version). Report the fitted results with some attention on how/whether to report the estimated coefficients for school IDs.
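A minimal sketch of such a fit is given below on simulated teacher-level data. The data frame `teachers`, its column names, and the simulated effect sizes are all hypothetical and stand in for your own aggregated data.

```r
set.seed(1)

# Hypothetical balanced design: 3 class types x 4 schools x 2 teachers per cell
teachers <- expand.grid(
  class_type = c("small", "regular", "regular+aide"),
  school_id  = paste0("S", 1:4),
  rep        = 1:2,
  stringsAsFactors = TRUE
)
# Simulated outcome: small classes get a made-up +10 shift
teachers$mean_math1 <- 520 +
  ifelse(teachers$class_type == "small", 10, 0) +
  rnorm(nrow(teachers), sd = 5)

# Additive two-way ANOVA: class type and school, no interaction
anova.fit <- aov(mean_math1 ~ class_type + school_id, data = teachers)
print(summary(anova.fit))

# Report only the class-type coefficients; school effects act as blocking terms
class_coefs <- coef(anova.fit)[grep("^class_type", names(coef(anova.fit)))]
print(class_coefs)
```

With many schools, listing every school coefficient rarely helps the reader, so summarizing them (or omitting them with a note) is one reasonable reporting choice.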

The null hypothesis for the primary question of interest is \(H_0 : \alpha_1 = \alpha_2 = \alpha_3 = 0\), and the alternative is \(H_a\) : not all \(\alpha\)s are zero. You can find the test statistic and p-value using summary(anova.fit), if you save your fitted model as anova.fit. Please be sure to specify the significance level and interpret your test result. Explain any additional assumptions involved in this test.

For the secondary question of interest, one option is Tukey’s range test (link). Again, specify the significance level, interpret your test result, and explain any additional assumptions involved in this test.
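Tukey’s range test is available in base R as TukeyHSD(). The sketch below applies it to a simulated additive model; the data frame `teachers`, its columns, and the 0.95 confidence level are assumptions chosen for illustration.

```r
set.seed(2)

# Hypothetical balanced teacher-level data: 2 teachers per class type per school
teachers <- data.frame(
  class_type = factor(rep(c("small", "regular", "regular+aide"), times = 8)),
  school_id  = factor(rep(paste0("S", 1:4), each = 6))
)
teachers$mean_math1 <- 520 +
  10 * (teachers$class_type == "small") +
  rnorm(nrow(teachers), sd = 5)

fit <- aov(mean_math1 ~ class_type + school_id, data = teachers)

# Pairwise comparisons between class types with family-wise 95% intervals
tukey <- TukeyHSD(fit, which = "class_type", conf.level = 0.95)
print(tukey)
```

Each row of the output compares two class types; an interval that excludes zero indicates a difference at the chosen family-wise level.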

5 Sensitivity analysis

6 (Optional) Causal interpretation

As an example for the creativity category in the grading rubric, you can investigate the plausibility of making causal statements, such as that smaller class sizes lead to better performance. Discuss the assumptions for causal interpretation and whether they are plausible in Project STAR. See, for instance, Chapter 9 in Imbens and Rubin (2015).

7 Discussion

Conclude your analysis in this section. You can touch on the following topics.

Acknowledgement

By default, it is assumed that you have discussed this project with your teammates and instructors. List anyone else with whom you have discussed this project.

Reference

List any references you cited in the report. See here for the APA format.

Imbens, G., & Rubin, D. (2015). Stratified Randomized Experiments. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 187-218). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.010

Session info

Report information of your R session for reproducibility.

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2     purrr_0.3.4    
##  [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.4    tidyverse_1.3.0
##  [9] plotly_4.9.2.1  ggplot2_3.3.2   AER_1.2-9       survival_3.2-7 
## [13] sandwich_3.0-0  lmtest_0.9-38   zoo_1.8-8       car_3.0-10     
## [17] carData_3.0-4  
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2         jsonlite_1.7.1     viridisLite_0.3.0  splines_4.0.3     
##  [5] modelr_0.1.8       Formula_1.2-4      assertthat_0.2.1   cellranger_1.1.0  
##  [9] yaml_2.2.1         pillar_1.4.7       backports_1.2.0    lattice_0.20-41   
## [13] glue_1.4.2         digest_0.6.27      RColorBrewer_1.1-2 rvest_0.3.6       
## [17] colorspace_2.0-0   htmltools_0.5.0    Matrix_1.2-18      pkgconfig_2.0.3   
## [21] broom_0.7.2        haven_2.3.1        scales_1.1.1       openxlsx_4.2.3    
## [25] rio_0.5.16         farver_2.0.3       generics_0.1.0     ellipsis_0.3.1    
## [29] withr_2.3.0        lazyeval_0.2.2     cli_2.2.0          magrittr_2.0.1    
## [33] crayon_1.3.4       readxl_1.3.1       evaluate_0.14      fs_1.5.0          
## [37] fansi_0.4.2        xml2_1.3.2         foreign_0.8-80     tools_4.0.3       
## [41] data.table_1.13.2  hms_0.5.3          lifecycle_0.2.0    munsell_0.5.0     
## [45] reprex_1.0.0       zip_2.1.1          compiler_4.0.3     rlang_0.4.8       
## [49] grid_4.0.3         rstudioapi_0.13    htmlwidgets_1.5.2  crosstalk_1.1.0.1 
## [53] labeling_0.4.2     rmarkdown_2.5      gtable_0.3.0       abind_1.4-5       
## [57] DBI_1.1.0          curl_4.3           R6_2.5.0           lubridate_1.7.9.2 
## [61] knitr_1.30         stringi_1.5.3      Rcpp_1.0.5         vctrs_0.3.5       
## [65] dbplyr_2.0.0       tidyselect_1.1.0   xfun_0.19